video and audio
UniForm: A Unified Diffusion Transformer for Audio-Video Generation
Zhao, Lei, Feng, Linfeng, Ge, Dongxu, Yi, Fangqiu, Zhang, Chi, Zhang, Xiao-Lei, Li, Xuelong
As naturally multimodal content, audible video delivers an immersive sensory experience, so audio-video generation systems hold substantial potential. However, existing diffusion-based studies mainly employ relatively independent modules to generate each modality, leaving shared-weight generative modules underexplored. This approach may underuse the intrinsic correlations between the audio and visual modalities, potentially resulting in sub-optimal generation quality. To address this, we propose UniForm, a unified diffusion transformer designed to enhance cross-modal consistency. By concatenating auditory and visual information, UniForm learns to generate audio and video simultaneously within a unified latent space, facilitating the creation of high-quality, well-aligned audio-visual pairs. Extensive experiments demonstrate the superior performance of our method on joint audio-video generation, audio-guided video generation, and video-guided audio generation tasks. Our demos are available at https://uniform-t2av.github.io/.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
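The core idea in the abstract above, generating both modalities with one set of shared weights by concatenating their latents into a single sequence, can be sketched in a few lines. This is a minimal toy illustration, not UniForm's implementation: the shapes, the random modality embeddings, and the single weight matrix standing in for the diffusion transformer are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent shapes (hypothetical): 16 video tokens and 8 audio tokens, dim 32.
D = 32
video_latents = rng.standard_normal((16, D))
audio_latents = rng.standard_normal((8, D))

# Learned modality embeddings (random stand-ins here) tell the shared
# backbone which tokens belong to which modality.
video_emb = rng.standard_normal(D)
audio_emb = rng.standard_normal(D)

# Concatenate both modalities into one sequence for a single shared model.
tokens = np.concatenate([video_latents + video_emb,
                         audio_latents + audio_emb], axis=0)

# Stand-in for the shared diffusion transformer: one weight matrix applied
# to every token, so audio and video are processed by the same parameters.
W = rng.standard_normal((D, D)) * 0.05
denoised = tokens @ W

# Split the joint output back into per-modality predictions.
video_out, audio_out = denoised[:16], denoised[16:]
print(video_out.shape, audio_out.shape)  # (16, 32) (8, 32)
```

The point of the concatenation is that every denoising step attends jointly over both modalities, so cross-modal correlations are captured by the shared weights rather than by separate per-modality modules.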
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Ishii, Masato, Hayakawa, Akio, Shibuya, Takashi, Mitsufuji, Yuki
In this work, we build a simple but strong baseline for sounding video generation. Given base diffusion models for audio and video, we integrate them, with additional modules, into a single model and train it to jointly generate audio and video. To enhance alignment between audio-video pairs, we introduce two novel mechanisms in our model. The first is timestep adjustment, which provides different timestep information to each base model; it is designed to align how samples evolve along the timesteps across modalities. The second is a new design for the additional modules, termed Cross-Modal Conditioning as Positional Encoding (CMC-PE). In CMC-PE, cross-modal information is embedded as if it represented temporal position information, and the embeddings are fed into the model like a positional encoding. Compared with the popular cross-attention mechanism, CMC-PE provides a better inductive bias for temporal alignment in the generated data. Experimental results validate the effectiveness of the two newly introduced mechanisms and demonstrate that our method outperforms existing methods. Diffusion models have made great strides in recent years on generation tasks across modalities, including image, video, and audio (Yang et al., 2023).
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
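The CMC-PE idea described above, injecting cross-modal conditioning additively at matching temporal positions rather than via cross-attention, can be illustrated with a toy sketch. Everything here is an assumption for demonstration (the frame counts, the simple average-pooling alignment, the additive injection); it is not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

D = 32
audio_frames = rng.standard_normal((40, D))   # 40 audio feature frames
video_tokens = rng.standard_normal((10, D))   # 10 video frame tokens

# Align audio to the video frame rate: average every 4 audio frames so that
# audio feature i corresponds temporally to video token i.
aligned = audio_frames.reshape(10, 4, D).mean(axis=1)

# Inject the cross-modal features additively at matching positions, the way
# a positional encoding is added -- the PE-like step that gives the
# temporal-alignment inductive bias.
conditioned = video_tokens + aligned
print(conditioned.shape)  # (10, 32)
```

Contrast this with cross-attention, where every video token can attend to every audio frame: the position-wise additive injection hard-wires "audio at time t conditions video at time t," which is the inductive bias the abstract credits for better temporal alignment.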
Meta plans to ramp up labeling of AI-generated images across its platforms
Meta plans to ramp up its labeling of AI-generated images across Facebook, Instagram and Threads to help make it clear that the visuals are artificial. It's part of a broader push to tamp down misinformation and disinformation, which is particularly significant as we wrangle with the ramifications of generative AI (GAI) in a major election year in the US and other countries. According to Meta's president of global affairs, Nick Clegg, the company has been working with partners from across the industry to develop standards that include signifiers that an image, video or audio clip has been generated using AI. "Being able to detect these signals will make it possible for us to label AI-generated images that users post to Facebook, Instagram and Threads," Clegg wrote in a Meta Newsroom post. "We're building this capability now, and in the coming months we'll start applying labels in all languages supported by each app."
- Media > News (0.91)
- Government > Regional Government > North America Government > United States Government (0.36)
CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling
Yang, Ruihan, Gamper, Hannes, Braun, Sebastian
We introduce a multi-modal diffusion model tailored for the bi-directional conditional generation of video and audio. Recognizing the importance of accurate alignment between video and audio events in multi-modal generation tasks, we propose a joint contrastive training loss to enhance the synchronization between visual and auditory occurrences. We conduct comprehensive experiments on multiple datasets to thoroughly evaluate the efficacy of our proposed model, assessing generation quality and alignment performance with both objective and subjective metrics. Our findings demonstrate that the proposed model outperforms the baseline, substantiating its effectiveness and efficiency. Notably, the incorporation of the contrastive loss improves audio-visual alignment, particularly in the high-correlation video-to-audio generation task. These results indicate the potential of our proposed model as a robust solution for improving the quality and alignment of multi-modal generation, thereby contributing to the advancement of video and audio conditional generation systems.
- North America > United States > California > Orange County > Irvine (0.14)
- North America > United States > Washington > King County > Redmond (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (4 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Vision (0.95)
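A contrastive alignment loss of the kind the CMMD abstract describes is typically an InfoNCE-style objective: embeddings of matched video-audio pairs are pulled together while mismatched pairs within the batch are pushed apart. The sketch below shows such a generic loss; the shapes, temperature, and batch construction are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(2)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_loss(video_emb, audio_emb, temperature=0.1):
    """InfoNCE-style loss: row i of each matrix is one clip's embedding,
    so the diagonal of the similarity matrix holds the matched pairs."""
    v = l2_normalize(video_emb)          # (B, D)
    a = l2_normalize(audio_emb)          # (B, D)
    logits = v @ a.T / temperature       # (B, B) pairwise similarities
    labels = np.arange(len(v))           # diagonal = matched pairs
    # Cross-entropy over rows (video -> audio direction).
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[labels, labels].mean()

B, D = 4, 16
v = rng.standard_normal((B, D))
loss_mismatched = contrastive_loss(v, rng.standard_normal((B, D)))
loss_matched = contrastive_loss(v, v + 0.01 * rng.standard_normal((B, D)))
print(loss_matched < loss_mismatched)  # matched pairs should yield the lower loss
```

Minimizing this loss during diffusion training encourages the model's audio and video representations of the same moment to coincide, which is the mechanism the abstract credits for improved synchronization.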
We Haven't Seen the Worst of Fake News
It was 2018, and the world as we knew it--or rather, how we knew it--teetered on a precipice. Against a rising drone of misinformation, The New York Times, the BBC, Good Morning America, and just about everyone else sounded the alarm over a new strain of fake but highly realistic videos. Using artificial intelligence, bad actors could manipulate someone's voice and face in recorded footage almost like a virtual puppet and pass the product off as real. In a famous example engineered by BuzzFeed, Barack Obama seemed to say, "President Trump is a total and complete dipshit." Synthetic photos, audio, and videos, collectively dubbed "deepfakes," threatened to destabilize society and push us into a full-blown "infocalypse."
- Asia > North Korea (0.29)
- Asia > Russia (0.14)
- North America > United States > New York (0.04)
- (5 more...)
- Media > News (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > North America Government > United States Government (0.87)
Are these guys for real? How to keep your business safe from deepfakes
Is that really Tom Cruise about to wrestle an alligator? Keanu Reeves dancing like nobody is watching? Deepfake technology is advanced artificial intelligence that replaces actual video and audio with video and audio that was artificially created from other sources. While it may look like harmless fun on TikTok, it's also becoming a huge security risk for businesses of all sizes. According to a just released report from the cloud service firm VMware, deepfake attacks are on the rise.
- Europe > United Kingdom (0.06)
- Asia > China > Hong Kong (0.06)
Death, resurrection and digital immortality in an AI world
I have been thinking about death lately, possibly because I recently had a month-long bout of Covid-19, and because I read a recent story about the passing of the actor Ed Asner, famous for his role as Lou Grant in "The Mary Tyler Moore Show."
The impact of deepfakes: How do you know when a video is real?
In a world where seeing is increasingly no longer believing, experts are warning that society must take a multi-pronged approach to combat the potential harms of computer-generated media. As Bill Whitaker reports this week on 60 Minutes, artificial intelligence can manipulate faces and voices to make it look like someone said something they never said. The result is videos of things that never happened, called "deepfakes." Often, they look so real, people watching can't tell. Even Justin Bieber has been tricked by a series of deepfake videos on the social media video platform TikTok that appeared to be of Tom Cruise.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > California (0.05)
- Information Technology > Security & Privacy (1.00)
- Media > News (0.71)
- Law (0.71)